Accuracy and repeatability of the Microsoft Azure Kinect for clinical measurement of motor function

Quantitative assessment of motor function is increasingly applied in fall risk stratification, diagnosis, and disease monitoring of neuro-geriatric disorders of balance and gait. Its broad application, however, demands low-cost and easy-to-use solutions that facilitate high-quality assessment outside laboratory settings. In this study, we validated in 30 healthy adults (12 female, age: 32.5 [22 – 62] years) the performance and accuracy of the latest generation of the Microsoft RGB-D camera, i.e., Azure Kinect (AK), in tracking body motion and providing estimates of clinical measures that characterise static posture, postural transitions, and locomotor function. The accuracy and repeatability of AK recordings were validated against a clinical reference standard, a multi-camera motion capture system (Qualisys), and compared to those of its predecessor, Kinect version 2 (K2). Motion signal quality was evaluated by Pearson’s correlation and signal-to-noise ratios, while the accuracy of estimated clinical parameters was described by absolute and relative agreement based on intraclass correlation coefficients. The accuracy of AK-based body motion signals was moderate to excellent (RMSE 89 to 20 mm) and depended on the dimension of motion (highest for anterior-posterior dimension), the body region (highest for wrists and elbows, lowest for ankles and feet), and the specific motor task (highest for stand up and sit down, lowest for quiet standing). Most derived clinical parameters showed good to excellent accuracy (r .84 to .99) and repeatability (ICC(1,1) .55 to .94). The overall performance and limitations of body tracking by AK were comparable to those of its predecessor K2 in a cohort of young healthy adults.
The observed accuracy and repeatability of AK-based evaluation of motor function indicate the potential for a broad application of high-quality and long-term monitoring of balance and gait in different non-specialised environments such as medical practices, nursing homes or community centres.


Introduction
Impairment of motor function can severely affect patients' mobility and quality of life [1]. Its clinical assessment plays a major role in the diagnosis and monitoring of disease progression performed in a single recording session per individual. Each participant gave written informed consent prior to study inclusion. The ethics committee of the medical faculty of the University of Munich approved the study protocol (034-16), and the study was conducted in conformance with the Declaration of Helsinki.

Experimental setup and data acquisition
All measurements took place indoors in the research facility of the department of neurology of the University Hospital of Munich. The room was dimly lit to avoid interference from natural and artificial light. A marker-based motion capture system (Qualisys AB, Sweden) consisting of nine wall-mounted Oqus cameras that covered full-body motion within an area of approximately 6 × 5 m was used to provide a clinical reference standard for the validation of the two Kinect sensors (see Fig 1). The Qualisys recording rate was set to 178 Hz, with an exposure time of 75 μs and a marker threshold of 17%, as a compromise between the largest possible field of view and the highest frame rate. Extensive spatial calibration of the system, covering an area of roughly 6 × 5 m, was performed each day prior to measurements. Both Kinect sensors were mounted on top of each other on a tripod at a height of 1.4 m. The K2 was inclined by approximately -9° in the pitch plane to centre the recorded person. Raw data of both sensors were captured at a sampling rate of approximately 30 Hz using customised software (Motognosis Labs v2.1.5 by Motognosis GmbH, Berlin, Germany), which utilised the Software Development Kit (SDK) version 1409 for the K2 and version 1.4.0 with body tracking SDK 1.0.0 for the AK. Technical specifications of both Kinect sensors are given in Table 1. Data for both Kinect sensors were recorded in their respective raw formats, including infrared and depth streams (see S1 Fig).

Tasks/Assessments
In accordance with previous studies [10,17,18], each participant performed a set of motor tasks with three to five immediate, successive executions each. These tasks reflect movement sequences commonly examined during a neurological assessment of balance and gait, including a quiet stance test to measure static postural control (POCO), a stand-up and sit-down test (SAS) to measure postural transitions, as well as a stepping-in-place test (SIP) and a walking test with comfortable speed (SCSW) to measure locomotor function (see Table 2). Standardised instructions were provided prior to each task. When deviations in the performance occurred, the measurement was interrupted, discarded and the instructions were given again. Stationary tasks (POCO, SAS and SIP) were conducted at 2.5 m distance to the Kinect sensors to ensure an optimal resolution of the depth data. During the SCSW, participants were asked to start walking towards the Kinects from a position just outside the sensor range at 5.5 m distance. The SCSW task terminated automatically when participants reached 1.5 m distance to the Kinect sensors. The POCO and SIP tasks were terminated after a fixed duration of 40 s, whereas the SAS task was terminated manually. For each task, automated audio signals indicated the beginning and end of each recording.

Movement analyses
For the Qualisys recordings, light-weight passive IR reflective markers with 19 mm diameter were placed on 36 defined anatomical landmarks [10] to capture motion of all major body joints as well as the head and the trunk. The markers were fixed to the subject's tight-fitting clothing or bare skin.
Spatial and temporal alignment. Data processing was performed in MATLAB (v 2018b, The MathWorks Inc., Natick, Massachusetts, United States). To compensate for temporary loss of marker locations (flickering) in Qualisys recordings, gaps in the raw marker traces shorter than 56 ms were interpolated using polynomial spline interpolation. The anatomical landmarks of the AK and the markers from Qualisys were approximately mapped to represent landmark locations similar to those of the K2 (as shown in S2 Fig). This way, our results are directly comparable to previous publications on K2 and AK accuracy.
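The gap-filling step can be illustrated with a short Python sketch. It is not the MATLAB code used in the study: the function name `fill_short_gaps`, the use of a cubic spline, and the definition of gap length as the number of missing samples times the sampling interval are assumptions made for illustration. At 178 Hz, a limit of 56 ms corresponds to gaps of at most nine consecutive missing samples under this definition.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def fill_short_gaps(t, x, max_gap_s=0.056):
    """Fill NaN gaps shorter than max_gap_s in a 1-D marker trace.

    t: sample times in seconds (strictly increasing).
    x: one marker coordinate, NaN where the marker was lost.
    """
    t = np.asarray(t, dtype=float)
    x = np.array(x, dtype=float)  # copy, so the input is not modified
    valid = ~np.isnan(x)
    missing = np.flatnonzero(~valid)
    if missing.size == 0 or valid.sum() < 4:
        return x

    # Cubic spline through all valid samples (assumed spline type).
    spline = CubicSpline(t[valid], x[valid])
    dt = np.median(np.diff(t))

    # Split the missing indices into runs of consecutive samples.
    runs = np.split(missing, np.flatnonzero(np.diff(missing) > 1) + 1)
    for run in runs:
        at_edge = run[0] == 0 or run[-1] == len(x) - 1
        if at_edge or run.size * dt >= max_gap_s:
            continue  # long gaps and leading/trailing gaps stay NaN
        x[run] = spline(t[run])
    return x
```

Longer gaps are deliberately left as NaN in this sketch so that later steps can exclude them explicitly instead of relying on interpolated values.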
All systems were first spatially aligned by normalizing the orientation of the coordinate system (x, y and z) so that the axes describe the same dimensions. Subsequently, the tilt of the K2 sensor was compensated based on the floor normal vector provided by the K2 SDK. Finally, the K2 and AK coordinate systems were rotated and translated to match the spatial orientation of Qualisys (see [10]). Next, all landmark trajectories obtained from the three systems were filtered by a 5 Hz first-order Butterworth low-pass filter. The temporal alignment between systems was achieved by down-sampling the Qualisys-derived time series to 30 Hz and then matching records by application of a temporal offset. Since some tasks like POCO yield small ranges of motion and gait tasks were non-stationary, cross-correlation alignment provided insufficient results. Therefore, offsets between systems were calculated based on system-specific timestamps and manually adjusted for each recording. After this temporal alignment, Qualisys recordings were cut at the beginning and end to match the Kinect sensor recordings.
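A minimal Python sketch of the filtering and temporal alignment steps is given below. It is an illustration under stated assumptions, not the original MATLAB processing: the function names `lowpass` and `resample_to` are hypothetical, zero-phase filtering with `filtfilt` is assumed (the text does not state whether the filter was applied forwards only), and down-sampling is done by linear interpolation onto the Kinect timestamps.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def lowpass(x, fs, cutoff_hz=5.0, order=1):
    """Low-pass filter a trajectory with a Butterworth filter.

    x: array of shape (n_frames,) or (n_frames, n_channels).
    fs: sampling rate in Hz.
    Assumption: zero-phase filtering (forward and backward pass).
    """
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, x, axis=0)


def resample_to(t_src, x_src, t_target, offset_s=0.0):
    """Interpolate a 1-D signal onto target timestamps.

    offset_s is added to the source timestamps to bring them onto the
    target clock (the manually adjusted temporal offset in the text).
    """
    t_shifted = np.asarray(t_src, dtype=float) + offset_s
    return np.interp(t_target, t_shifted, x_src)
```

For example, a Qualisys trace sampled at 178 Hz could be filtered with `lowpass(trace, fs=178.0)` and then evaluated at the Kinect frame times with `resample_to(t_qualisys, filtered, t_kinect, offset_s=offset)`, where `offset` is the per-recording offset derived from the timestamps.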
Extraction of spatiotemporal parameters. To characterise motor performance, a specific set of spatiotemporal clinical parameters was calculated for each task based on the mapped and aligned motion trajectories of the anatomical landmarks. In total, 23 different parameters were computed. Postural sway during the POCO task was analysed as the maximum range and average velocity of angular sway in anterior-posterior (AP), medio-lateral (ML) and 3D direction (the sway vector was defined as the extension of the spine base landmark relative to the mean position of both ankle landmarks) [17]. Sway parameters were calculated for the entire task duration including standing with eyes open and closed. Performance during the SAS task was quantified by the duration required to complete each transition (i.e., standing up and sitting down). Additionally, the range of movement of the shoulder spine landmark and the hands in AP direction was calculated to assess the amount of forward bending as well as potential compensatory movement strategies during both types of transitions [19]. Performance during the SIP task was characterised by a set of stepping features (i.e., cadence as the total number of steps per minute, average step and stance time) that are commonly used as surrogate markers for muscular weakness, hypokinesia or muscle fatigue [18]. In addition, the movement range was calculated as the mean amplitude of the knee landmark displacement in AP direction. Locomotor performance during the SCSW was characterised by calculating gait speed, the number of steps, the average step length, and cadence [19]. The amplitude and side asymmetry [20,21] of arm swing during walking were calculated based on the flexion-extension angle between the shoulder spine landmark and the wrist landmarks.
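As an illustration of the sway parameters, the following Python sketch computes the range and mean velocity of angular sway from landmark trajectories. Several details are assumptions and not taken from the study: the function name `sway_parameters`, the use of the per-frame midpoint of both ankles as the base of the sway vector, the axis convention (x = ML, y = vertical, z = AP, as used for the results tables), and the definition of mean velocity as the mean absolute frame-to-frame angle change times the sampling rate.

```python
import numpy as np


def sway_parameters(spine_base, ankle_left, ankle_right, fs=30.0):
    """Range and mean velocity of angular sway in ML, AP and 3D.

    Each input is an (n_frames, 3) array with columns x (ML),
    y (vertical) and z (AP). Angles are returned in degrees.
    """
    spine_base = np.asarray(spine_base, dtype=float)
    ankle_mid = 0.5 * (np.asarray(ankle_left, dtype=float)
                       + np.asarray(ankle_right, dtype=float))
    v = spine_base - ankle_mid  # sway vector per frame

    # Angle of the sway vector against the vertical axis.
    ml = np.degrees(np.arctan2(v[:, 0], v[:, 1]))
    ap = np.degrees(np.arctan2(v[:, 2], v[:, 1]))
    cos_3d = np.clip(v[:, 1] / np.linalg.norm(v, axis=1), -1.0, 1.0)
    sway_3d = np.degrees(np.arccos(cos_3d))

    result = {}
    for name, angle in (("ML", ml), ("AP", ap), ("3D", sway_3d)):
        result[name] = {
            "range_deg": float(angle.max() - angle.min()),
            "mean_velocity_deg_s": float(np.mean(np.abs(np.diff(angle))) * fs),
        }
    return result
```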
Statistical analysis of technical validity. In this work, the technical validity of the body tracking capabilities of the AK was analysed as (1) the accuracy of the movement signals, (2) the accuracy of derived spatiotemporal parameters, and (3) the repeatability of spatiotemporal parameters. All statistical procedures were performed in Python 3.8 using the packages 'pandas', 'scipy', 'statsmodels', 'seaborn' and 'matplotlib'.
Accuracy and repeatability of movement patterns. The accuracy of derived spatiotemporal movement parameters was first described by providing descriptive statistics for each parameter and system as well as the absolute difference between parameters derived from each Kinect system and the Qualisys system. Absolute agreement between Qualisys and each of the two Kinect sensors was analysed using intraclass correlation coefficients (ICC(A,1); two-way mixed model), whereas relative agreement, which ignores potential systematic differences between systems, was quantified by Pearson's correlation coefficient. The repeatability of measures was determined by the ICC(1,1) (one-way random model) and the absolute and relative standard error of measurement (SEM).
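The agreement and repeatability measures can be sketched with plain NumPy as follows. This is an illustrative computation, not the analysis script of the study: the function names are hypothetical, the data are assumed to form a complete matrix with one row per participant and one column per repetition (or per system), and the SEM is computed as SD × sqrt(1 − ICC), which is one common definition that the text does not confirm.

```python
import numpy as np


def _mean_squares(y):
    """ANOVA mean squares for an (n_subjects, k_measurements) matrix."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_within = (ss_total - ss_rows) / (n * (k - 1))
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_within, ms_error


def icc_1_1(y):
    """One-way random, single-measurement ICC (repeatability)."""
    n, k, ms_rows, _, ms_within, _ = _mean_squares(y)
    return (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)


def icc_a_1(y):
    """Two-way, single-measurement ICC for absolute agreement."""
    n, k, ms_rows, ms_cols, _, ms_error = _mean_squares(y)
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denom


def sem(y):
    """Absolute SEM and SEM in percent of the grand mean (assumed definition)."""
    y = np.asarray(y, dtype=float)
    absolute = y.std(ddof=1) * np.sqrt(1.0 - icc_1_1(y))
    return absolute, 100.0 * absolute / y.mean()
```

For the comparison between systems, `icc_a_1` would receive a two-column matrix (Qualisys and one Kinect sensor); for repeatability, `icc_1_1` and `sem` would receive the repeated measurements of one system.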

Interferences between Qualisys and Azure Kinect
During the system setup, we noticed a varying degree of interference between the infrared emitters of Qualisys and the AK but not with the K2. This interference became apparent mainly as stronger noise in the depth stream, in particular at distances above 2.5 m from the sensor, and as depth frames in which large areas of depth information could not be retrieved (see black areas in S3 Fig).
The number of impaired depth frames was inconsistent and independent of the measurement setup as well as the frame rate of Qualisys (see S1 File). Since all stationary tasks were conducted at a distance of 2.5 m, the resulting anatomical landmark models from the AK were not affected by the interference from Qualisys. However, stable body tracking during the walking task was limited to distances below 3.5 m, resulting in overall short gait paths (see S2 File). In comparison, the length of the gait path increased to 5.2 m after shutting down Qualisys.
Of the 480 recordings performed, 38 were excluded due to loss or covering of Qualisys markers (n = 19), failure of the AK SDK to extract anatomical landmarks (n = 9) and other technical errors (n = 10).

Accuracy of movement signals
The spatial accuracy of the anatomical landmarks from K2 and AK is reported for selected landmarks as average estimates across all motor tasks and task repetitions in the AP (z axis), ML (x axis) and vertical (V) dimension (y axis) (see Table 3). Task-specific results for all anatomical landmarks are given in the S1 and S2 Tables.
The overall spatial deviation of anatomical landmark positions derived from the Kinect sensors from the clinical reference standard marker locations was higher for AK (RMSE > 23 mm) than for K2 (RMSE > 20 mm), and particularly high for both sensors' estimates of appendicular landmarks, i.e., the position of the feet, ankles, and hands. Correlation results were moderate to excellent for both AK and K2, with highest agreement with the clinical reference standard in AP direction and lowest accuracy in V direction.
Task-specific SNRs (see Table 4) revealed that movement signals derived from the AK showed similar or slightly lower accuracy with respect to all landmarks compared to K2. Lowest SNR values were observed for ankle, foot, and knee landmarks in particular during stationary tasks such as POCO and SAS. Higher levels of noise in AK landmarks were additionally confirmed by visual inspection. For larger movements, however, SNR of motion signals derived from AK was in general good.
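The three signal-level measures reported above can be illustrated with a short Python sketch for one landmark and one axis. The function name `signal_accuracy` is hypothetical, and the SNR definition used here (variance of the reference signal divided by the variance of the difference signal, in decibels) is an assumption, since the exact definition is not stated in this section.

```python
import numpy as np


def signal_accuracy(reference, estimate):
    """RMSE, Pearson's r and SNR (dB) between two aligned 1-D signals.

    reference: trajectory from the reference system (e.g. Qualisys).
    estimate: trajectory from the sensor under test, same length and units.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - reference

    rmse = float(np.sqrt(np.mean(error ** 2)))
    r = float(np.corrcoef(reference, estimate)[0, 1])
    # Assumed SNR definition: reference variance over error variance.
    snr_db = float(10.0 * np.log10(np.var(reference) / np.var(error)))
    return rmse, r, snr_db
```

Under this definition, a nearly stationary landmark yields a small reference variance and therefore a low SNR even when the absolute error is small, which is consistent with the pattern described for quiet standing.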

Accuracy of spatiotemporal parameters
Based on the first measurement of each participant, descriptive statistics as well as relative and absolute agreement between K2 and AK against Qualisys are given in Table 5. Due to the very small SCSW recording area from AK (recognised walk length below 1.5 m), the recordings did not cover enough gait cycles to derive more sophisticated gait parameters. Only gait speed, which is based on the spine base landmark and does not require gait cycle detection, is reported here.
High to excellent relative and absolute agreement was found for spatial and temporal parameters from K2 and AK. The measurements of knee movement amplitudes during stepping in place showed only moderate absolute agreement for both sensors. Absolute agreement was overall slightly lower in AK than in K2, possibly due to small systematic differences. Bland-Altman plots for all parameters are provided in the S4 to S7 Figs.

Repeatability of spatiotemporal parameters
Repeatability of parameters was explored based on immediate repetitions of motor task performances (see Table 6). We found good to excellent repeatability for all three systems for most parameters except for measures of sway range in pitch and 3D during stance and arm swing asymmetry during gait. These measures also showed a large SEM above 20%. All three systems exhibited comparable repeatability, with slightly lower outcomes for the AK.

Discussion
In this study, we evaluated the concurrent validity of the new Microsoft Azure Kinect sensor (i.e., AK) for its application in clinical assessment of motor function. A cohort of young healthy individuals performed different clinical tasks assessing static posture, postural transitions as well as locomotor function. The accuracy and reliability of AK for marker-less tracking of body movements during these tasks were validated against a marker-based motion capture system (i.e., Qualisys) and compared to those of its predecessor, the Kinect 2 (i.e., K2). Overall, AK exhibited a high accuracy and repeatability with respect to tracking of body landmarks and derived clinical outcome measures. Despite considerable differences in hardware and body tracking algorithms between AK and K2, both sensors yielded similar performance and limitations in body tracking capability. In the following, we will discuss these findings with respect to the accuracy and limitations of AK for clinical motion tracking and potential fields of clinical applications.
When compared to the Qualisys clinical reference system, AK-derived movement signals yielded a moderate to high accuracy of body motion signals that depended on (1) the dimension of movement, (2) the body landmark location, and (3) the clinical motor task. Accordingly, motion signal accuracies were highest in the AP dimension and lowest for movements along the vertical dimension. Tracking of body landmarks was most precise for the head, trunk, and upper extremities but somewhat compromised for the lower extremities, in particular with respect to movements of the ankles and feet. Moreover, tracking accuracy depended on the specific motor task and exhibited limited precision with a low signal-to-noise ratio (SNR) in the case of stationary landmarks, e.g., tracking of lower extremities during quiet stance. This limitation has been previously noted and interpreted as a specific difficulty of marker-less tracking approaches to differentiate the feet from the floor in the case of close proximity [10,22]. Also, AK seems to show slightly higher levels of noise in vertical direction, especially during quiet stance. However, irrespective of body region and movement dimension, we observed that AK yielded excellent accuracy in the case of large amplitude body motions. The observed dependency of AK tracking accuracy on movement dimension and body location concurs with previous reports that focused on AK-based assessment of treadmill locomotion [14,15]. For each clinical motor task, we calculated a set of common spatiotemporal outcome measures based on the recorded body motion patterns. Overall, the AK-derived clinical outcome measures yielded a good to excellent relative and absolute agreement with the Qualisys clinical reference standard. Lowest, but still moderate, agreement was found for measures of static postural sway and timing while stepping in place.
Differences in static sway measures (in particular in ML dimension) could be caused by a systematic spatial offset of the underlying spine base landmark between AK and Qualisys. The moderate agreement with respect to step timing is likely due to a discrepancy in the recognised number of steps between AK and Qualisys. Furthermore, almost all AK-derived spatiotemporal outcome measures showed a good to excellent consistency between repeated assessments of the same clinical motor task. Accuracy and repeatability measures presented for the K2 concur with previous publications [9–13].
Three studies recently demonstrated a high validity of the AK sensor in estimating spatiotemporal gait measures during overground and treadmill locomotion [14–16]. Our current observations extend these previous findings and demonstrate that, in young healthy adults, the AK provides valid and reliable estimates of spatiotemporal outcome measures that are commonly used in the clinical evaluation of motor dysfunction. Quantitative assessments of static posture, postural transition, and locomotor function have been frequently shown to entail important information for a personalised fall risk estimation in neuro-geriatric patients and to reflect disease severity and progression of various neurodegenerative disorders [8,23–25]. Hence, analogous to its predecessor K2 [18], the comparatively low equipment and personnel costs of AK may facilitate a broad application of reliable and high-resolution balance and gait assessment in various clinical and non-clinical environments such as outpatient clinics, medical practices, nursing homes or community centres. Furthermore, the observed consistency of AK-derived measures between repeated assessments may in particular facilitate an objective monitoring of subtle disease- or age-related alterations of motor function in the long term. However, since the present validation experiments solely focused on a young, healthy population, subsequent studies are required to assess the validity and reliability of AK-based assessment of motor function in more heterogeneous study populations that are expected to show more variable movement patterns (e.g., elderly people or persons with motor impairments).
Table 6. Repeatability of spatiotemporal parameters derived from 3 to 5 repeated measurements as intra-class correlation coefficient ICC(1,1) and standard error of measurement (SEM %).
Compared to its predecessor K2, which is no longer manufactured, the AK underwent major developments in particular with respect to its optical hardware and the utilised body tracking approach. While body pose tracking of the K2 is based on a random forest model trained on the sensor's depth images [26], the AK features both an improved resolution of the integrated depth camera and a refined body tracking approach that utilises convolutional neural networks based on recent developments in deep learning. These hardware and software changes could be expected to yield a more accurate body tracking performance. In line with these assumptions, AK tracking performance during treadmill locomotion was previously shown to surpass that of its predecessor K2, even though overall performance differences were only moderate to marginal [14,15]. In contrast, we found an overall similar performance between AK and K2 sensors, both with respect to the accuracy of landmark motion signals and the validity of derived clinical outcome measures. Moreover, tracking performance of AK was found to be even inferior to that of K2 at larger distances from the sensor. The latter can be explained by the presence of interference between the clinical reference standard Qualisys motion capture system and the Kinect sensors, which operate in overlapping regions of the IR spectrum. This interference has been noted previously [27]: while Qualisys recordings appear to be unaffected, the depth recordings of Kinect sensors become noticeably distorted at larger distances from the sensor. However, except for the gait task (SCSW), all clinical tasks were performed in close proximity to the Kinect sensors and should therefore not be affected by Qualisys-induced distortions in the depth stream.

Table 6 column headings: Mean (SD), ICC(1,1), and SEM (%), each reported for Q, K2, and AK.
In conclusion, the presented validation experiments in young, healthy individuals demonstrate that AK is able to accurately monitor body movements and to provide reliable estimates of gait and balance capacity during a variety of motor tasks commonly performed in clinical assessment. Hence, AK represents a valid but seemingly not superior alternative to its predecessor sensor K2, which is no longer manufactured. Further studies are required to confirm the current observations for the application of AK-based motion analysis in different clinical cohorts.